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1.  INTRODUCTION 

The  increasing  levels  of  integration  and  miniaturization  in  military  communications  devices 
require  the  co-design  of  antennas  and  other  electromagnetic  elements  with  the  baseband  and  drive 
circuitry.  Field/circuit  simulation  is  an  important  design  prototyping  step  in  research  and 
development  of  such  military  communications  devices.  Time-domain  techniques  have  significant 
advantages  over  the  frequency-domain  techniques  for  modeling  nonlinear  and  time-varying 
components.  However,  no  existing  commercial  EM  simulation  software  can  handle  such 
complexity  of  co-simulation  of  complicated  nonlinear  circuitry  in  the  field  solver.  This  research 
will  produce  the  first  commercial  software  which  can  combine  complex  electromagnetic  structures 
with  nonlinear  and  dynamic  circuitry  in  a  multiscale  manner  by  combining  a  circuit  solver  with  a 
hybrid  EM  solver  suitable  for  all  scales  (fine,  intermediate,  and  coarse  scales).  This  multiscale 
solver  combines  three  efficient  electromagnetic  field  algorithms,  (a)  the  spectral  element 
time-domain  (SETD)  method  for  coarse  scale,  (b)  the  enlarged  cell  technique  (ECT)  for  the 
boundary  conformal  finite-difference  time-domain  method  [i.e.,  the  FDTD  method  improved  to  the 
second  order  in  the  presence  of  curved  conductors]  for  intermediate  scale,  and  (c)  the 
finite-element  time-domain  (FETD)  method  for  fine  scale;  the  field  solver  is  coupled  with  (d)  a 
nonlinear  circuit  solver  based  on  SPICE.  The  multiscale  field  solver  is  highly  efficient  for  any 
mixed-scale  problems. 

1.1.  Identification  and  Significance  of  the  Problem 

The  co-design  of  antennas  (and/or  other  electromagnetic  elements)  with  the  baseband  and 
drive  circuitry  becomes  increasingly  important  in  today’s  military  communications  devices.  A 
highly  accurate  and  efficient  combined  field/circuit  solver  is  essential  for  the  successful  design 
prototyping  of  these  complex  devices.  Electromagnetic  field/circuit  solver  codes  have  been  used  in 
the  microwave  research  community,  but  none  of  the  commercial  software  tools  (such  as  HFSS  of 
Ansoft  or  Microwave  Studio  of  CST)  can  solve  the  combined  field  problem  with  nonlinear  and 
dynamically  changing  circuit  components.  Therefore,  the  first  problem  is  the  development  of  the 
first  commercial  field/circuit  solver  prototype  and  its  feasibility  in  the  simulation  of  realistic 
devices.  This  is  important  because  existing  research  codes  in  this  area  lack  the  user-friendliness 
and  efficiency  for  fast  prototyping. 

The  second  problem  is  the  low  accuracy  and  efficiency  in  the  FDTD  method  used  in 
conventional  field/circuit  solvers  in  research  community.  As  is  now  well  known  (also  pointed  out 
in  [1]  with  a  recent  FDTD/circuit  solver),  the  FDTD  method  suffers  from  the  so-called  staircasing 
errors  because  a  curved  surface  is  approximated  by  a  staircased  surface.  Such  staircasing  errors 
make  the  computed  electromagnetic  fields  near  the  curved  surface  zero-order  accurate  (i.e.,  the 
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accuracy  does  not  improve  as  one  refines  the  mesh),  and  globally  at  best  first-order  accurate 
because  of  super-convergence  [2,3].  Consequently,  if  the  sampling  density  (i.e.,  the  number  of 
points  per  wavelength,  or  PPWs)  is  doubled,  the  global  numerical  error  decreases  only  by  half. 
This  is  in  stark  contrast  with  the  performance  of  the  FDTD  method  in  a  homogeneous  medium 
whose  outer  boundary  aligns  with  the  mesh  -  in  that  case  the  convergence  is  second  order,  i.e.,  if 
the  sampling  density  is  doubled,  the  global  numerical  error  decreases  by  a  factor  of  four.  We  will 
use  our  enlarged  cell  technique  (ECT)  to  address  this  problem. 

The  third  problem  is  the  low  efficiency  in  the  conventional  field  solvers  (and  thus  field/circuit 
solvers)  for  multiscale  problems  simultaneously  with  fine,  intermediate,  and  coarse  scales 
compared  with  the  wavelength.  The  conventional  FDTD  and  FETD  (finite-element  time-domain) 
methods  are  only  feasible  for  small  and  intermediate  scale  problems  because  of  their  lower-order 
convergence.  The  high-order  and  spectral  methods  developed  by  this  team  are  much  more  efficient 
for  coarse-scale  problems  where  the  feature  size  of  device  structure  is  larger  than  a  wavelength. 
In  this  project,  we  will  combine  three  methods,  i.e.,  the  FETD  method  for  the  fine  scale  (where  the 
feature  size  is  smaller  than  0.01  wavelengths),  the  ECT  for  the  intermediate  scale  (where  the 
feature  size  is  between  0.01-2  wavelengths),  and  the  SETD  (spectral  element  time-domain)  method 
for  the  coarse  scale  (where  the  feature  size  is  larger  than  2  wavelengths).  This  combined  field 
solver  is  ideal  for  a  multiscale  problem  where  all  three  scales  coexist;  the  discontinuous  Galerkin 
method  (DGM)  will  be  used  to  combine  these  three  methods  in  the  field/circuit  solver.  Such  a 
multiscale  field/circuit  solver  will  be  especially  advantageous  for  phased  arrays  with  fine  circuitry 
details. 


1.2.  Available  Field/Circuit  Solvers  in  Research  Community 

Currently  available  time-domain  field/circuit  solvers  use  three  different  methods  for  the  field 
simulation:  (a)  the  FDTD  method,  (b)  the  FETD  method,  and  (c)  the  time-domain  integral  equation 
(TDIE)  method.  Sui  et  al.  [4]  introduce  the  FDTD  and  lumped  circuits  in  2D.  The  3-D  FDTD  and 
circuit  simulation  is  developed  by  Piket-May  et  al.  [5].  Thomas  et  al.  [6]  propose  to  use  SPICE 
with  the  FDTD  method  for  field/circuit  simulation.  The  time-consuming  SPICE  approach  is 
avoided  in  the  FDTD/circuit  simulation  through  nonlinear  solver  based  on  the  Newton-Raphson 
algorithm  in  [7-9].  The  FETD  method  has  been  proposed  in  [10-12],  and  more  recently  the  TDIE 
method  has  been  developed  in  [13,  14].  The  SPICE  models  can  be  found  in  [15,  16]  with  the 
Modified  Nodal  Analysis  (MNA)  Circuit  Equations. 

The  major  limitation  of  the  FDTD/circuit  solver  is  its  low  accuracy  due  to  starecasing  errors 
(see  more  discussions  below).  The  FETD  method  is  more  accurate  than  FDTD  method  for  curved 
and  complex  structures,  but  it  is  challenging  for  large-scale  problems  because  a  large  mass  matrix 
has  to  be  inverted.  The  TDIE  method  still  lags  far  behind  in  terms  of  its  computation  efficiency  and 
thus  significant  more  research  has  to  be  done  before  it  becomes  practical. 
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1.3.  Current  Techniques  to  Address  the  Field  Problem 

In  the  computational  electromagnetics  (CEM)  community,  recently  at  least  three  classes  of 
techniques  have  been  developed  to  address  the  large  staircasing  errors  in  the  FDTD  method:  (a)  the 
body  fitted  coordinates  [18],  (b)  fractional  cell,  mixed  boundary  elements,  or  embedded  boundaries 
[2,3,17,19,20],  and  (c)  fully  unstructured  mesh  techniques  [21-28].  The  first  two  methods  [(a) 
and  (b)]  can  achieve  second-order  accuracy,  while  the  third  technique  [(c)]  can  potentially  achieve 
spectral  accuracy  (i.e.,  the  error  decreases  exponentially  with  the  sampling  density).  In  comparison, 
techniques  in  (b)  are  easier  to  implement  than  techniques  in  (a),  and  apply  to  more  complicated 
structures.  Our  numerical  results  confirmed  that  the  ECT  improves  the  FDTD  method  to 
second-order  accuracy.  Furthermore,  at  the  same  5%  accuracy  requirement,  the  ECT  requires 
only  8  PPWs,  while  the  FDTD  method  requires  80  PPWs  (10  times  denser  sampling);  thus,  the 
ECT  is  approximately  1000  times  more  efficient  than  FDTD  for  such  a  problem  in  3D. 

The  techniques  in  class  (c)  above  include  the  discontinuous  Galerkin  method  and  multidomain 
pseudospectral  time-domain  (PSTD)  method.  The  main  advantage  of  these  techniques  is  their 
spectral  accuracy  (thus  very  low  sampling  density)  for  electrically  large  domains  (i.e.,  domains 
with  structures  large  compared  to  the  wavelength).  We  have  developed  both  DGM  [30]  and  PSTD 
methods  [23-28]  and  showed  that  these  methods  require  only  about  3-4  points  per  wavelength. 
On  the  other  hand,  they  are  not  suitable  for  electrically  fine  regions  where  the  geometrical  features 
are  much  smaller  than  wavelength  (<  V*  wavelength),  because  at  this  sub-wavelength  regime 
lower-order  methods  (such  as  the  FDTD  or  its  improved  versions)  are  already  accurate  enough  and 
less  expensive  because  the  methods  in  (c)  have  some  overhead  associated  with  the  interface 
treatment. 

Recently,  we  have  developed  the  spectral-element  time-domain  (SETD)  method  [31]  by  using 
the  Gauss-Lobatto-Legendre  (GLL)  polynomials  as  the  basis  functions  and  GLL  points  as  the 
quadrature  integration  points.  This  yields  a  block-diagonal  mass  matrix,  which  is  much  more 
efficient  than  that  in  the  finite  element  time-domain  method  where  the  mass  matrix  is  not 
block-diagonal.  Compared  to  the  PSTD  method  or  spectral  DGM,  this  SETD  method  does  not  have 
the  overhead  associated  with  the  interface  treatment  for  each  element.  To  treat  problems  with 
different  scales,  we  have  also  incorporated  the  discontinuous  Galerkin  method  to  the  SETD  method 
to  allow  multidomain  treatment  (see  [32]).  Therefore,  the  SETD  method  can  be  viewed  as  an 
improved  method  compared  to  the  multidomain  PSTD  method. 

In  summary,  these  more  advanced  methods  proposed  to  overcome  the  limitations  of  the 
conventional  FDTD  method  have  problem  dependent  advantages  and  disadvantages. 
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1.4.  The  Proposed  Multiscale  Time-Domain  Solver  for  Field/Circuit  Simulation 

In  view  of  the  advantages  and  disadvantages  of  the  above  new  techniques,  we  proposed  to 
develop  a  hybrid  method  that  combines  (a)  the  SETD  method  (Lee  and  Liu  [31,32])  with  (b)  the 
second-order  accurate  enlarged  cell  technique  ([17,29]),  and  (c)  the  LETD  method  for  very  fine 
details.  Specifically,  the  SETD  method  will  be  used  in  electrically  large  regions  so  that  the 
sampling  density  there  can  be  greatly  reduced,  while  the  ECT  method  will  be  used  in  regions  that 
have  sub-wavelength  fine  geometrical  features.  The  FETD  method  is  for  even  finer  structures 
where  the  explicit  time  integration  in  ECT  is  not  efficient.  The  interface  between  different  regions 
will  be  treated  by  the  discontinuous  Galerkin  method  where  the  flux  is  correctly  updated  to  ensure 
stability. 

For  any  given  field/circuit  problem,  one  can  divide  the  problem  geometry  into  two  classes,  (a) 
one  with  sub-wavelength  fine  structures  where  the  ECT  (boundary-conformal  FDTD)  method  will 
be  utilized,  and  (b)  one  without  sub-wavelength  structures  where  the  SETD  method  will  be  used  to 
allow  coarse  sampling.  The  interface  conditions  will  be  treated  by  correctly  accounting  for 
electromagnetic  fluxes  with  the  discontinuous  Galerkin  approach  (Riemann  solver). 
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2.  SIMULATION  METHODS 

In  this  section,  we  will  give  all  simulation  methods  for  the  multiscale  problems.  We  will  first 
present  the  formulation  of  the  SETD  and  FETD.  Next,  Multi-Domain  ECT  and  Multi-Circuit  will 
be  presented.  And  then  the  hybridization  of  SETD,  FETD,  and  ECT  will  be  followed. 

2.1.  SETD  and  FETD  Methods 

In  this  section,  we  will  give  a  brief  introduction  to  the  multiscale  hybrid  method  combining  the 
SETD  and  FETD  methods.  The  governing  equations  are  Maxwell’s  equations,  as  shown  in  Eq.  (2.1) 

and  (2.2),  where  the  electric  field  E  and  magnetic  field  H  are  normalized  by  ^  —  and 

H  =  H/v^. 


6r 


dE 

dt 


c0V  x  H 


(2.1) 


fir 


<9H 

dt 


— cqV  x  E 


(2.2) 


2.1.1.  SETD  Method 


The  spectral  element  time-domain  (SETD)  method  is  an  efficient  and  accurate  high-order 
method  allowing  coarse  sampling  for  electrically-large  structures.  In  this  method,  Maxwell’s 
equations  are  discretized  as 


d  f  ~ 

er  —  /  E  ■  $>i  dil 
dt 


co  /  H  •  (V  x  dfl 

J  Q 

/  E  dn - l—  [ 
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+co  /  (n  x  H)  •  ds 
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Js  •  dQ, 


(2.3) 
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p,r—  [  H  •  \l>i  dQ,  =  -c0  [  E  •  (V  x  *!>*)  dQ, 

dt  Jn  Jn 

—Co  /  (n  x  E)  •  ds 

JdQ 


(2.4) 


where  and  are  basis  functions  based  on  the  Gauss-Lobatto-Legendre  polynomials  for  E 
and  H,  respectively. 

To  model  an  arbitrary  hexahedron  element,  a  reference  element  of  cube  is  usually  introduced. 
The  mapping  between  the  hexahedron  and  the  reference  cube  is  as  following 

$  =  j-14 


Vx$  =  tJ  V  x  $ 


(2.5) 


where  $  and  $  represent  the  basis  functions  in  the  hexahedron  element  and  in  the  reference 
element,  respectively.  And 


)  4K\0 


(2.6) 


and  J  is  the  Jacobian  matrix  defined  as 


dx 
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dz  " 

di 
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dx 
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2.1.2.  FETD  Method 


The  finite  element  time-domain  (FETD)  method  is  an  efficient  method  for  complex  electrically 
fine  structures.  Its  discretization  of  Maxwell’s  equation  is  similar  to  the  SETD  method.  For  example, 
the  equation  to  update  E  is  shown  in  Eq.  (2.8). 
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The  differences  between  the  FETD  and  SETD  methods  come  from  the  basis  and  test  functions 
and  the  element  shape,  where  tetrahedron  may  also  be  considered.  In  the  FETD  method, 


lNJe  tvt^  ~  ~ 

mixed-order  basis  functions  are  considered,  such  as  and  for  E  and  H,  respectively.  For 
example,  the  vector  basis  functions  for  hexehedron  element  in  FETD  are  defined  as 


^  e  f  4>n  \f})  \C)  £ 

N  =  <  <pm\v)  4>n\v)  4>{v\Q  v 
[  <f&\0  <t>n\v)  0PO)(O  C 

a  (  4>m\0  4>p\C)  £ 

N  =  {  <0{v)  (/>n\rj)  0p2)(O  v 
(  0m}(C)  <j>n\ri)  <j>p\0  C 


The  discontinuous  Galerkin  (DG)  method  is  employed  to  hybridize  two  different  methods 
across  an  interface.  Several  kinds  of  DG  operators  can  be  used  for  the  boundary  integral  items  over 
interfaces  between  different  sub-domains.  Each  DG  operator  has  its  own  advantages  and 
disadvantages.  Here,  we  demonstrate  how  to  communicate  the  two  FETD  and  SETD  domain  by  the 
center  flux  in  a  DG  method,  the  two  boundary  integral  items  on  an  interface  will  be  determined  by 
the  average  of  field  values  of  two  sub-domains  adjacent  to  the  interfaces,  i.e. 


f  N{‘}  ■  h{,)  x  HdS  =  -  f  JV<0  •  h(i)  x  Hu>dS  +  -  f  N®  ■  h(i)  x  H  dS  (2.11) 

JS  6  2 2 

£  N(hl)  ■n(i)xEdS  =  ^sN(hi) -n{i)  xE(i)dS  +  Vf  • n(i)xE+dS  (2.12) 


where  and  fields  are  from  the  local  sub-domain,  viz.  the  zth  sub-domain,  and  E+ 

and  H+  are  from  neighbor  sub-domains. 


2.1.3.  Numerical  Examples 

The  first  example  is  a  simple  metallic  cavity  case.  It  is  used  to  verify  our  FETD  code.  A 
TE101  mode  is  excited  and  the  Ez  fields  are  recorded  at  an  observation  point.  The  results  are 
compared  with  FDTD  results,  as  shown  in  Fig.  2.1. 
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Iteraion 


Fig.  2.1:  A  metallic  cavity  excited  by  TE101  mode. 


The  second  example  involves  a  dielectric,  which  is  impinged  by  a  plane  wave.  The  geometry 
is  shown  in  Fig.  2.2  Left.  And  the  SETD  results  are  shown  in  Fig.  2.2  Right.  The  comparison  with 
the  ECT  results  is  used  to  verify  our  SETD  code. 


Fig.  2.2:  Left: 
point. 


a  dielectric  is  impinged  by 


a  plane  wave.  Right:  Ez 


fields  recorded  at 


an  observation 
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2.2.  Multi-Domain  ECT  Engine 

The  ECT  (Enlarged  Cell  Technique)  engine  is  used  for  the  field  simulation  in  our  software.  It 
is  an  improved  conformal  FDTD  simulation  solver  without  the  need  to  reduce  the  time  step  for 
irregular  cells.  As  is  well  known,  the  FDTD  solver  can  be  easily  modified  for  parallel  computation 
by  dividing  its  total  computational  domain  into  a  number  of  sub-domains.  And  so  is  the  ECT 
engine. 

2.2.1.  Multi-Domain  Methodology 

Here  we  briefly  discuss  about  our  methodology  to  divide  the  computational  domain  and  to 
communicate  among  adjacent  sub-domains.  Several  issues  have  to  be  considered:  domain  partition, 
domain  communication  and  element  locating.  For  the  domain  partition,  we  choose  a  simple  but 
practical  and  robust  strategy.  For  our  multi-domain  computation,  the  total  computational  domain  is 
divided  into  a  number  of  sub-domain  segments  in  each  of  the  three  Cartesian  directions  in  a 
structured  manner  with  Nx  x  Ny  x  Nz  sub-domains.  For  example,  the  total  domain  is  divided  into  3 
x  4  x  5  sub-domains,  with  3  domain  segments  in  the  x  direction,  4  in  the  y  direction,  and  5  in  the  z 
direction.  All  sub-domains  are  aligned  to  their  adjacent  domains  horizontally  or  vertically.  Such  a 
structured  layout  of  sub-domains  in  the  regular  partition  makes  the  bookkeeping  of  sub-domains 
very  simple  and  efficient,  because  an  efficient  data  structure  of  3-D  array  can  be  used  to  record  all 
the  sub-domains.  For  example,  the  array  element  (2,  3,  4)  can  be  used  to  denote  the  sub-domain 
which  is  at  the  2nd  place  in  the  x  direction,  3rd  place  in  the  y  direction  and  4th  place  in  the  z 
direction.  In  addition,  each  sub-domain  has  its  own  memory  allocation  of  its  field  components  and 
supporting  data  for  an  efficient  use  of  memory  and  a  good  performance  of  computational  time. 

For  the  sub-domain  communication,  we  have  to  consider  a  specialty  of  the  staggered  grid  used 
in  the  ECT  engine,  where  E  field  components  and  H  field  components  are  displaced  by  half  a  cell, 
as  shown  in  Fig.  2.3. 
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are  displaced  by  half  a  cell. 


A  symmetric  scheme  is  constructed  for  the  sub-domain  communication.  We  assume  the  left 
walls  are  E  walls,  and  right  walls  are  H  walls.  In  other  words,  on  the  left  surface  of  the  sub-domain, 
the  E  tangential  components  are  located,  while  on  the  right  surface  of  the  sub-domain,  the  H 
tangential  components  are  located,  as  shown  in  Fig.  2.3.  And  the  tangential  fields  on  the  boundary 
walls  will  be  obtained  by  transferring  from  its  adjacent  sub-domains.  An  elemental  loop  of  the  time 
stepping  including  the  adjacent  sub-domain  communication  is  as  follows: 

(1)  Update  the  E  field  in  each  sub-domain. 

(2)  Transfer  the  E  field  from  the  left  sub-domains  to  the  right  sub-domains. 

(3)  Update  the  H  field  in  each  sub-domain. 

(4)  Transfer  the  H  field  from  the  right  sub-domains  to  the  left  sub-domains. 

To  locate  an  element,  such  as  a  source,  an  observer,  and  so  on,  we  have  to  find  first  in  which 
sub-domain  it  is  and  then  find  in  which  cell  it  is  in  that  sub-domain. 

2.2.2.  Numerical  Example 

A  patch  antenna  as  shown  in  Fig.  2.4  is  used  here  to  verify  our  development  of  multi-domain 
ECT  engine.  It  is  fed  by  a  transmission  line  of  characteristic  impedance  of  50  ohm.  The  total 
domain  is  divided  into  4  sub-domains  along  the  y  direction.  The  scattering  parameter  Sll  is 
calculated.  And  its  result  is  plotted  in  Fig.  2.5.  To  verify  the  result,  a  single  domain  simulation  is 


14 


Wave  Computation  Technologies,  Inc.,  Contract  W911NF-09-C-0159 


also  performed.  We  can  easily  see  that  the  two  results  agree  very  well.  This  confirms  that  the 
multidomain  ECT  method  works  well  for  antenna  problems. 


Fig.  2.5:  Comparison  of  Sll  parameter  results  for  a  multi-domain  and  a  single  domain  simulation  for 
the  patch  antenna.  The  two  results  completely  overlap. 
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2.3.  Multi-Circuit  Simulation 

To  extend  the  application  domain  of  our  software  in  circuit/field  co-simulation,  we  have  added 
a  new  feature  to  support  multiple  circuits,  especially  multiple  SPICE  circuits. 

2.3.1.  Scheme  to  Support  Multiple  Circuits 

First,  we  will  introduce  our  scheme  to  support  multiple  circuits.  Two  circuit  solvers  are 
provided  in  our  software  for  the  field/circuit  co-simulation:  the  internal  circuit  solver  for  some 
simple  circuit  elements  and  the  SPICE  circuit  solver  for  arbitrarily  complex  circuits.  For  the 
internal  circuit  solver,  there  is  no  difficulty  to  support  multiple  circuits  as  long  as  two  circuit 
elements  are  not  defined  at  the  same  position.  However,  for  the  SPICE  circuit  solver,  an  extra  work 
is  required  to  support  multiple  circuits  since  the  third-party  SPICE  library  is  employed  in  our 
package.  Currently,  a  simple  way  is  applied,  which  solve  the  multiple  circuits  sequentially  at  each 
time  step  of  the  ECT.  The  Norton  equivalent  circuit  method  is  used  to  link  the  SPICE  and  ECT  cell. 
At  each  time  step,  the  updating  scheme  is  shown  as  follows. 

1.  The  current  IN  for  each  circuit,  as  shown  in  Fig.  2.6,  is  computed  at  time  step  n+1/2  using 
the  H  field  circulated  around  the  cell  containing  the  device  port. 

2.  The  SPICE  circuit  solver  is  used  to  model  the  circuits  one  by  one  with  INn+1/2  obtained  in 
step  1  as  an  excitation.  This  step  updates  Vdev11  to  VdeVn+\  thereby  En+1  components  at  the 
device  port  is  obtained. 

3.  Use  the  normal  ECT  scheme  to  update  the  E  field  elsewhere. 

4.  Use  the  normal  ECT  scheme  to  update  the  H  field  from  n+1/2  time  step  to  n+3/2  time  step. 


—3 - 

+ 

Vdev(t) 

— O - - - 


Fig.  2.6:  The  i-th  circuit  device  embedded  in  a  cell  of  the  ECT  grid  and  the  Norton  equivalent  circuit 
looking  into  the  ECT  grid  from  this  port. 


Embedded  Circuit 
Device 


2.3.2.  Numerical  Example 

To  verify  the  scheme  of  multi-circuit  simulation  a  radio  frequency  signal  amplifier,  as  show  in 
Fig  2.7,  is  modeled  by  a  SPICE  circuit.  The  basic  geometry  is  shown  in  Fig.  2.7,  where  a  patch 
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antenna  is  used  to  receive  electromagnetic  signal.  And  the  signal  is  transmitted  by  a  coaxial  cable. 
An  amplification  circuit  is  used  to  connect  the  inner  and  outer  conductor  of  the  coaxial  cable  to 
amplify  the  transmitted  signal,  as  shown  in  Fig.  2.8. 


Fig.  2.7:  The  basic  geometry  of  the  radio  frequency  signal  amplifier.  It  includes  a  patch  antenna  to 
receive  an  electromagnetic  wave  and  a  coaxial  cable  to  transmit  the  received  signals. 


Fig.  2.8:  Left:  An  amplification  magnifier  is  built  to  connect  the  inner  and  outer  conductor  of  the 
transmitted  coaxial  cable.  Right:  the  amplification  circuit  with  vccl  =  cc2  =  10  V.  Cl  and  C2  are 
coupling  capacitors  with  a  same  value  of  1  n  F.  R1  and  R2  are  DC  bias  resistors  with  a  same  value  of  1 
M  Q  .  R3  =  35k  Q  ,  R4  =  100M  Q  .  The  MOSFET  is  N  channel  BSIM2. 
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A  plane  wave  of  frequency  2.4  GHz  and  amplitude  10  V/m  is  used  to  model  the 
electromagnetic  signal.  The  voltage  received  at  the  coax  cable  is  recorded.  Two  results  are 
compared  with  and  without  the  using  of  the  amplification  circuit.  We  can  clearly  see  from  Fig.  2.9 
that  the  effect  of  the  amplification. 


Fig.  2.9:  Voltages  received  with  and  without  the  amplification  circuit.  The  blue  curve  is  the  input 
signal  to  the  amplification  circuit,  which  is  in  fact  the  signal  received  without  the  use  of  the 
amplification  circuit. 

2.4.  Hybrid  SETD/FETD  Method 

2.4.1.  Formulations 

Field-circuit  applications  involve  multiple  scales  and  complex  geometries.  Usually  the  circuit 
regions  have  electrically  small  scales  and  complex  geometries,  while  the  field  propagation  regions 
have  electrically  large  scales  and  simple  geometries.  A  hybrid  method,  when  set  up  appropriately, 
can  significantly  outperform  any  individual  methodology  for  a  complex  multiscale  problem. 
Considering  the  geometrical  features  of  field-circuit  applications,  we  developed  a  hybrid 
SETD/FETD  method  for  their  simulation.  To  apply  this  hybrid  method,  the  computational  domain 
is  divided  into  a  number  of  sub-domains  according  to  their  electrical  scale  properties.  The  SETD 
method  is  assigned  to  electrically  large  sub-domains  to  allow  a  very  coarse  sampling  for  efficiency; 
while  the  FETD  method  is  assigned  to  electrically  small  sub-domains  with  fine  and  complex 
structures.  Structured  cuboid  meshes  are  used  in  the  SETD  sub-domains;  while  unstructured 
tetrahedron  meshes  are  used  in  the  FETD  sub-domains.  The  most  important  advantage  of  our 
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hybrid  method  is  that  the  meshes  in  the  different  sub-domains  can  be  generated  independently.  This 
alleviates  the  mesh  generation  difficulties  with  large  multiscale  problems  in  the  finite  element 
method  (FEM). 

The  discontinuous  Galerkin  (DG)  method  is  employed  to  coupling  two  different  methods 
across  an  interface.  It  basically  implements  the  surface  integration  in  the  weak  forms  of  Maxwell’s 
equations  in  the  FEM  context  by  using  fields  in  both  side  of  the  interface.  The  weak  forms  of 
Maxwell’s  equations  are  shown  in  Eq.  (2.13)  and  (2.14): 


er  f  E  •  dCl  =  c0  f  H  •  (' V 
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tir—  [  H  •  dfi  =  -c0  [  E  •  (V  x  d/j)  dn 

dt  Jn  Jn 

—Co  /  (n  x  E)  •  ds 
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And  Riemann  solver  Eq.  (2.15)  and  (2.16)  are  used  to  couple  the  fields  on  both  sides  of  the 
interface: 
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The  surface  integration  between  two  different  sub-domains  requires  an  analysis  of  the  element 
shape  relationship  when  a  non-conformal  (non-matched)  mesh  exists  at  the  interface.  For  example,  on 
an  interface  of  an  SETD  sub-domain  and  a  FETD  sub-domain,  the  relation  of  a  rectangle  element  and  a 
triangle  element  has  to  be  considered.  The  triangle  and  rectangle  elements  have  many  geometric 
relations.  We  have  successfully  analyzed  the  elemental  shape  relations  in  all  kinds  of  situations. 
Integration  over  the  shared  area  of  polygon  is  required  in  the  hybrid  SETD/FETD  method.  To 
integrate,  we  split  the  polygon  into  several  triangles,  because  the  Gaussian  quadrature  over  a 
triangle  is  well  known,  and  can  be  used  to  perform  the  numerical  integration  readily. 
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2.4.2.  Numerical  Example 

The  example  is  a  plane  wave  incident  on  two  distant  scatterers.  One  is  a  PEC  sphere  located  at 
the  left-bottom-front  corner,  while  the  other  is  a  PEC  cylinder  located  at  the  right-top-back  corner. 
The  two  scatterers  are  purposely  placed  far  away  to  simulate  a  multi-scale  situation.  The  sizes  of 
the  scatters  are  around  half  of  a  wavelength.  The  distance  between  them  is  about  3  wavelengths. 
FETD  sub-domains  are  assigned  around  the  small  scatterers,  as  shown  in  Fig.  2.10,  while  the  other 
space  is  divided  into  SETD  sub-domains. 


z 

O 

_x_ 

Fig.  2.10:  The  total  computational  domain  is  divided  into  125  sub-domains  with  5  partitions  in  each 
direction.  Two  FETD  sub-domains  are  assigned  around  the  two  small  scatters,  while  the  others  are 
SETD  sub-domains.  PML  is  used  to  absorb  outgoing  waves  at  the  computational  edge. 


Fig.  2.11:  The  tetrahedron  meshes  of  the  two  FETD  sub-domains.  The  left  is  for  the  left-lower-front 
FETD  sub-domain  with  a  PEC  sphere  scatterer.  The  right  is  for  the  right-top-back  FETD  sub-domain 
with  a  PEC  cylinder  scatterer. 
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The  tetrahedron  meshes  of  the  two  FETD  sub-domains  are  shown  in  Fig.  2.11.  A  plane  wave 
propagating  along  the  X  direction  is  incident  on  the  two  scatterers.  The  electric  field  of  the  plane 
wave  is  polarized  along  the  Z  direction.  Two  observers  are  placed  to  record  the  Ez  field  component. 
The  results  are  compared  with  the  FDTD  results.  Again,  excellent  agreement  is  obtained  as  shown 
in  Fig.  2.12. 
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Fig.  2.12:  The  Ez  field  at  the  two  observers  located  at  (0.5,  1.5,  2.5)  m  and  (2.5,  1.5,  2.5)  m.  Excellent 
agreement  is  obtained  by  comparing  with  the  reference  FDTD  results. 


2.5.  Field-Circuit  Applications 

We  have  demonstrated  some  field-circuit  applications  with  our  software.  The  applications 
include:  (1)  single  stage  amplifier;  (2)  multiple  (2  and  3)  stage  amplifiers.  The  first  application  is 
from  the  paper  [33]  so  that  we  can  compare  our  results  with  those  reference  results.  Good 
agreement  of  our  results  to  those  in  the  reference  paper  is  obtained.  It  shows  that  our  software  can 
successfully  simulate  such  complex  field-circuit  applications. 

2.5.1.  Single  Stage  Amplifier 

The  first  application  is  a  single  stage  microwave  FET  amplifier.  The  simulation  structure 
consists  of  a  microwave  FET  and  a  microstrip  line.  Fig.  2.13  shows  the  configuration  of  the 
microwave  FET  amplifier  in  WCT  modeling  environment.  The  gray  box  is  the  FDTD  simulation 
area.  Details  of  geometry  parameters  of  the  active  device  are  given  in  Fig.  2.14,  where  the 
amplifier  is  connected  between  points  A  and  B. 
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Fig.  2.13:  The  perspective  view  in  Wavenology  EM  for  a  microwave  amplifier  connected  to  a 
microstrip  line. 


Fig.  2.14:  Detail  parameters  of  the  microstrip  and  a  linear  microwave  amplifier  in  Fig.  2.13. 


The  small  signal  equivalent  circuit  of  the  FET  is  given  in  Fig.  2.15. 


Fig.  2.15:  The  small  signal  equivalent  circuit  of  a  common-source  microwave  FET. 
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We  use  the  cell  size  in  the  X  and  Y  directions  as  =  0.1  mm  and  Ay  -  mm.  In  the  Z 

direction,  we  have  five  computation  cells  for  the  dielectric  substrate  since  Wavenology  EM  uses  a 
nonuniform  gridding  scheme.  The  time  step  increment  is  0.09  ps,  which  is  smaller  than  the 
reference  value  of  0.16  ps  in  [33].  The  dielectric  constant  of  the  substrate  is  2.17.  A  BHW  pulse 
time  function  is  applied  at  Lumped  Port  1  on  the  left  end  of  the  microstrip  line.  The  observation 
port  is  Lumped  Port  2  on  the  right  end  of  the  microstrip  line. 

The  simulated  S-parameters,  Sll  and  S21,  of  the  active  microwave  amplifier  circuit  are  shown 
in  Figs.  2.17  and  2.18,  respectively,  from  1  GHz  to  15  GHz.  The  results  from  FDTD  [33]  and 
ADS  Agilent’s  Advanced  Design  System  (ADS)  were  provided  for  comparison.  Good  agreement 
is  achieved  between  the  method  from  FDTD  [33]  and  Wavenology  EM.  The  small  difference 
from  ADS  is  due  to  the  lack  of  full  wave  analysis  capability  of  ADS. 


Fig.  2.17:  Magnitude  of  the  Sll  of  the  amplifier  circuit  in  Fig.  2.13  compared  with  the  FDTD  [1]  and 
ADS  results. 
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Fig.  2.18:  Magnitude  of  the  S21  of  the  amplifier  circuit  in  Fig.  2.13  compared  with  the  FDTD  [33] 
and  ADS. 


2.5.2.  Multiple  Stage  Amplifier 


To  demonstrate  our  software’s  capabilities  in  simulating  complex  multiple-stage  amplifiers  in 
particular,  and  multiple  circuits  in  general,  the  two-  and  three-stage  amplifier  cases,  as  shown  in 
Fig.  2.19  and  2.20,  are  considered. 
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Fig.  2.20:  The  small  signal  equivalent  circuit  for  a  three-stage  amplifier  configuration. 


Figs.  2.19  and  2.20  give  the  small-signal  equivalent  circuit  models  for  the  two-  and 
three-stage  amplifiers,  which  are  used  to  replace  the  single-stage  amplifier  in  Fig.  2.13.  Fig.  2.21 
and  42  show  the  comparison  of  magnitude  of  Sll  for  these  two  cases  separately.  As  seen  from  Figs. 
2.21  and  2.22,  Wavenology  EM  successfully  simulated  the  hybrid  field-circuit  problem  with  the 
multi-stage  microwave  FET  amplifier  embedded  in  the  gap  of  a  microstrip  line. 


Fig.  2.21:  Magnitude  of  the  Sll  of  the  amplifier  circuit  with  one,  two,  and  three  stages  in  Fig.  2.13. 
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Fig.  2.22:  Magnitude  of  the  S21  of  the  amplifier  circuit  with  one,  two,  and  three  stages  in  Fig.  2.13. 


2.6.  Hybrid  SETD/FETD/ECT  Method 

2.6.1.  Motivation  to  integrate  the  ECT  method  into  the  hybrid  method 

To  apply  this  hybrid  method,  the  whole  computational  domain  is  divided  into  a  number  of 
sub-domains  according  to  their  electrical  scale  properties.  The  SETD  method  is  assigned  to 
electrically  large  sub-domains;  while  the  FETD  method  to  electrically  small  sub-domains  with  fine 
structures.  To  increase  the  capability  of  the  hybrid  method  in  dealing  with  more  complex 
multiscale  field/circuit  problems,  we  now  further  combine  it  with  the  enlarged  cell  technique 
(ECT).  The  ECT  method  is  an  improved  conformal  FDTD  method  which  does  not  require  a  time 
step  reduction  caused  by  the  small  irregular  cells  around  metallic  boundaries.  The  ECT  method  is 
well  suited  to  model  electrically  intermediate  sub-domains  because  of  the  efficiency  in  the  FDTD 
method.  Another  reason  to  add  the  ECT  method  in  our  hybrid  method  is  that  the  circuit  components 
can  be  easily  and  efficiently  implemented  and  embedded  in  the  FDTD  method.  Therefore,  one 
major  part  of  this  work  during  this  reporting  period  is  to  incorporate  the  ECT  with  our 
SEDT/FETD  method. 

2.6.2.  Hybridization  of  SETD  and  ECT  Methods 

Although  the  idea  of  hybridization  is  simple,  to  build  a  robust  algorithm  that  is  stable  and 
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accurate  is  quite  challenging  because  of  the  different  characteristics  of  different  methods.  In  our 
extensive  research,  we  found  that  being  free  of  spurious  solutions  is  one  of  the  most  important 
prerequisites  for  the  hybrid  method.  To  achieve  this  goal,  a  one-cell  layer  of  buffer  zone  of  an 
FETD  subdomain  [38]  is  used  to  bridge  the  two  totally  different  methods:  the  SETD  method  and 
the  ECT  method.  The  idea  to  use  the  FETD  buffer  comes  from  the  both  closeness  of  the  FETD  with 
the  ECT  and  the  FETD  with  the  SETD.  In  other  words,  the  FETD  method  serves  as  an  effective 
bridge  between  the  ECT  and  SETD  methods.  The  FETD  buffer  shares  a  one-cell  layer  with  the  ECT 
domain.  Moreover,  the  E  nodes  in  the  shared  layer  of  the  FETD  and  the  ECT  are  at  exactly  the 
same  position.  The  coupling  of  the  FETD  and  SETD  is  imposed  through  the  discontinuous 
Galerkin  operations.  Vector  basis  functions  are  employed  for  both  FETD  and  SETD  to  avoid 
spurious  solutions.  A  schematic  diagram  for  the  hybridization  of  the  SETD  and  ECT  method 
through  the  FETD  buffer  is  shown  in  Fig.  2.22  -  2.24. 
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Fig.  2.22:  A  one-cell  buffer  zone  of  FETD  is  used  to  bridge  the  SETD  domain  and  the  ECT  domain. 
The  ECT  domain  and  the  FETD  buffer  are  overlapped  by  one-cell  layer. 
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Fig.  2.23:  The  hybrid  SETD/FETD  simulation  in  the  shaded  region.  After  obtaining  the  boundary 
values  of  electric  fields  at  boundary  II  from  the  ECT  domain,  the  shaded  region  is  simulated  with  the 
hybrid  SETD/FETD  method. 


E  Fields  at  boundary 
I  are  provided  by 
SETD 


E  Fields  at  ; 
boundary  II  are 
sent  to  FETD  + 
SETD 


SETD 


Fig.  2.24:  The  ECT  simulation  in  the  shaded  region.  After  obtaining  the  boundary  values  of  electric 
fields  at  boundary  I  from  the  SETD  +  FETD  domain,  the  shaded  region  is  simulated  with  the  ECT 
method. 
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3.  Parallel  Computation  by  Multiple  Threads 

The  simulation  method  we  used  is  a  hybrid  multidomain  method,  in  which  the  total  spatial 
domain  is  divided  into  a  number  of  subdomains.  Each  subdomain  can  be  assigned  to  a  different 
method  according  to  its  particular  geometric  characteristics.  Three  methods  are  available: 

spectral-element  time-domain  (SETD)  method,  finite-element  time-domain  (FETD)  method,  and 

enlarged  cell  technique  (ECT,  an  improved  conformal  FDTD  method).  The  multidomain  method  is 
well  suited  to  parallel  computation.  We  have  implemented  the  parallel  computing  for  the 
multidomain  method  on  a  shared  memory  computing  system  with  multiple  CPU  cores,  in  which  the 
updating  of  the  fields  of  the  subdomains  is  assigned  to  a  number  of  cores  by  using  multithread 
technique.  This  section  introduces  our  parallel  implementation  for  the  hybrid  multidomain  method 
in  3  aspects:  the  multidomain  parallelization,  the  multidomain  balance,  and  the  performance 
speedup. 

3.1.  Introduction  of  Parallelization 

We  introduce  the  parallel  computation  and  some  important  issues  in  our  designing  of  the 
parallel  program  for  the  joint  field-circuit  solver.  For  parallel  computing,  there  are  4  types  of 
parallelism.  According  to  the  grain,  from  coarse  to  fine,  they  are: 

1.  Task  parallelism  -  divides  the  problem  task  into  several  subtasks  and  execute  them 
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simultaneously.  In  this  situation,  the  tasks  sometimes  need  to  exchange  data.  Therefore,  a  task 
manager  is  required  to  synchronize  the  tasks. 

2.  Data  parallelism  -  separates  the  independent  data  into  groups,  then  executes  them 
simultaneously.  In  this  parallelism,  the  processing  unit  can  be  a  computing  node,  a  CPU  core  or 
the  multiplier  in  CPU. 

3.  Instruction  parallelism  -  reorders  the  instructions  and  makes  them  capable  of  executing 
simultaneously  through  a  processor  instruction  pipeline. 

4.  Bit  parallelism  -  how  many  bits  can  be  executed  simultaneously,  for  example,  32-bit  or 
64-bit  system. 

Item  4  above  is  hardware  bus  bit-width.  Item  3  depends  on  the  processor  architecture  and  the 
compiler  ability.  Item  2  depends  on  the  computing  algorithm  and  the  compiler  ability.  Item  1 
depends  on  how  to  divide  the  algorithm  into  subtasks,  and  how  to  execute  and  synchronize 
subtasks  on  different  computing  nodes  simultaneously.  Currently,  we  are  only  interested  in  the 
parallelism  in  items  1  and  2. 

For  the  task  parallelism,  it  can  be  further  divided  into  two  different  groups  as  follows 
according  to  the  implementation  method. 

1.  Multiple-threads  on  multiple-cores  in  a  single  computer,  in  which  each  subtask  is  a 
thread.  The  subtasks  are  executed  on  different  CPU  of  the  computer.  This  is  the  shared 
memory  architecture.  The  balance  of  CPU  and  memory  are  controlled  by  the  operating 
system  (OS). 

2.  Distributed  computing,  in  which  each  subtask  will  be  sent  to  a  core  (local  or  remote)  and 
executed.  The  progresses  of  those  subtasks  (or  pass  data  back)  are  reported  to  a  controller. 
The  balance  of  program  is  controlled  by  the  algorithm  designer. 


We  have  implemented  the  parallelism  method  by  multiple-threads  on  multiple  cores.  The 
procedure  of  our  implementation  is  as  follows.  Firstly,  the  computational  domain  will  be  split  into 
several  sub-domains.  Then  N  threads  are  created.  All  the  threads  are  managed  by  a  thread  pool,  as 
shown  in  Fig.  3.1,  there  are  4  subdomains  and  4  threads  (N=4). 
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Fig.  3.1:  A  parallelism  structure  by  using  the  multi-domain  hybrid  method. 

At  every  time  step  in  the  joint  field-circuit  solver,  we  execute  each  thread,  i.e.,  update  the 
sub-domain,  by  one  step.  The  maximum  working  (running  simultaneously)  thread  number  is 
decided  by  the  maximum  value  between  the  user-defined  core  number  and  computer  core  number. 
When  a  thread  finishes,  it  reports  to  the  thread  pool  and  pauses  to  let  thread  pool  start  a  waiting 
thread.  After  all  threads  finish,  the  thread  pool  will  exchange  data  among  all  threads  and  record 
data,  and  additionally,  report  to  application  if  necessary.  Then  all  threads  resume  to  the  next  step. 


3.2.  Multidomain  Parallelization 

We  implement  the  parallel  method  for  our  hybrid  solver  on  a  shared  memory  machine  with 
multiple  cores  using  multithread  technique.  With  user  suggested  number  of  multiple-threads,  we 
will  create  N  threads  (N  =  min(M,  min(0,  P)),  where  M  is  the  number  of  subdomains,  O  the 
number  of  user  suggested  threads,  and  P  the  number  of  available  CPU  cores).  Using  this 
formulation,  we  can  limit  that  there  is  only  one  thread  running  on  one  core  at  most;  and  eliminate 
the  time  on  thread  switching  to  increase  the  performance.  For  each  thread,  we  will  assign  some 
subdomains,  depending  on  how  we  balance  the  tasks  on  each  thread,  into  each  thread’s  job-list. 
There  are  3  types  of  solvers  for  current  hybrid  method,  FDTD,  FETD  and  SETD.  Each  sub-domain 
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can  be  solved  by  any  one  of  these  3  solvers.  Here,  we  illustrate  how  to  parallelize  one  case  with 
nine  sub-domains,  which  has  3  FDTD  sub-domains,  3  FETD  sub-domains  and  3  SETD 
subdomains. 


FDTD 

FDTD 

FDTD 

FETD 

FETD 

FETD 

SETD 

SETD 

SETD 

Fig.  3.2:  A  computational  domain  with  3  FDTD  subdomains,  3  FETD  subdomains  and  3  SETD 
sub-domains. 


We  assume  3  threads  will  be  used  in  the  simulation  for  the  case  in  Fig  3.2.  We  will  assign  one 
FDTD,  one  FETD  and  one  SETD  subdomain  for  each  thread.  The  general  scheme  for  the  parallel 
computing  of  the  hybrid  multidomain  method  is  as  followed.  At  first,  we  update  FDTD 
sub-domains  simultaneously.  Then  pass  the  EM  field  from  FDTD  sub-domains  to  adjacent  FETD  or 
SETD  sub-domains.  Next,  we  update  FETD  sub-domains  or  SETD  sub-domains  simultaneously 
(the  running  sequence  of  FETD  and  SETD  solver  can  be  arbitrary)  and  exchange  EM  field  between 
FETD  sub-domains  or  SETD  sub-domains.  Finally,  we  need  to  pass  EM  field  from  FETD 
sub-domains  or  SETD  sub-domains  to  FDTD  sub-domain. 


3.3.  Multidomain  Balance 

In  general,  different  domain  has  different  number  of  unknowns,  and  even  for  the  same  number  of 
unknowns,  different  solver  requires  different  solving  time.  Moreover,  sometimes  the  number  of 
subdomains  is  much  larger  than  the  number  of  threads.  Waiting  time  among  different  threads  is 
necessary  to  ensure  all  the  subdomains  are  updated  in  the  same  pace.  To  reduce  the  waiting  time,  we 
balance  the  load  on  each  thread  to  let  them  be  able  to  finish  approximately  at  the  same  time.  In  the  load 
balance  method,  we  roughly  assume  that  the  subdomain  solving  time  is  proportional  to  the  number  of 
unknown  in  the  subdomain.  An  example  is  given  as  follows. 

Assume  that  2  threads  need  to  balance  and  the  computational  load  is  10  subdomains  whose  solving 


32 


Wave  Computation  Technologies,  Inc.,  Contract  W911NF-09-C-0159 


times  are  evaluated  as  1  to  10.  As  shown  in  Fig.  3.3,  we  firstly  reorder  the  load  to  a  list  with  a 
descending  order.  Then  we  assign  the  first  load  of  the  list  to  a  minimal  running  time  thread,  remove  the 
load  from  list,  and  sum  the  thread  running  time  as  the  loads  assigned  to  it.  We  repeat  this 
assign-remove-resume  procedure  until  the  job  list  becomes  empty.  It  can  be  seen  in  Fig.  3.3  that  the 
final  running  time  of  two  threads  is  28  and  27,  which  means  that  the  load  is  almost  evenly  assigned  to 
the  threads. 
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(c)  After  2  assign-remove-resumes. 


(d)  After  3  assign-remove-resumes 
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(e)  After  5  assign-remove-resumes 


(f)  End  of  balance 


Fig  3.3:  Balance  10  subdomains  on  2  threads. 


3,4.  Performance  Speedup 


An  example  is  used  to  test  the  performance  speedup  of  the  parallel  computation.  As  shown  in 
Fig.  3.4,  a  multidomain  with  56  SETD,  3  FETD  and  1  FDTD  sub-domains  is  studied.  We  evaluate 
the  performance  speedup  with  1,  2,  3  and  4  threads  respectively. 


SETD 

FETD 

FETD 

FDTD 

FETD 

Fig.  3.4:  An  example  of  multidomain  with  56  SETD,  3  FETD  and  1  FDTD  subdomains. 
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Thread  Number 

Fig.  3.5:  Simulation  time  with  different  number  of  threads  for  the  example  in  Fig.  3.4. 


Fig.  3.6:  Performance  speedup  factor  of  parallel  computation  for  different  number  of  threads. 
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The  speedup  factor  is  shown  as  Fig  3.6  where  the  speedup  factor  is  not  ideally  linear  with  the 
thread  number.  They  are  two  reasons.  The  first  is  that  we  need  to  run  FDTD  domain  firstly,  then 
run  FETD  and  SETD  domain  because  of  the  different  time  integration  methods  between  the  FDTD 
and  FETD/SETD  methods.  The  second  is  come  from  the  hardware  limitation.  We  use  a  computer 
with  Intel  Q6600  CPU  and  4  GB  DDR2-800  memories.  The  memory  bandwidth  limits  the  system 
speedup.  A  similar  phenomenon  has  been  also  observed  in  our  parallelized  FDTD  method. 


3.5.  Numerical  Results 

To  verity  the  hybrid  method  using  the  parallel  algorithm,  a  homogeneous  rectangular  cavity 
model,  which  includes  a  SETD  domain,  a  FETD  domain,  and  ECT  domains,  is  presented  as  shown 
in  Fig.  3.7.  This  cavity  has  a  dimension  of  2.5  mx2mxl.5m  and  has  5x4x3  subdomains.  A 
dipole  source  with  the  maximum  frequency  of  100  MHz  is  at  (0.2,  0.2,  0.2)  m  and  an  observer  is 
located  at  (1.2,  1.1,  0.8)  m.  The  computed  electric  field  at  the  observer  is  compared  with  that  by 
using  a  pure  FDTD  method  with  40  PPW,  as  shown  in  Fig.  3.8,  where  a  good  agreement  is  obtained. 
The  CPU  time  of  this  cavity  is  110  s  by  1  core,  85  s  by  2  cores,  59  s  by  3  cores,  and  57  s  by  4  cores. 


Fig.  3.7:  Rrectangular  cavity  model.  Lower  corner  is  (0,  0,  0)  m  and  upper  corner  is  (2.5,  2.5,  2). 
SETD  domain  is  located  at  [4,  2,  2],  FETD  domain  at  [2,  3,  2],  and  the  others  are  FDTD  domains. 
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Fig.  3.8:  Comparison  of  electric  field  at  (1.2,  1.1,  0.8)  m. 


The  second  example  is  an  inhomogeneous  case  with  a  curved  geometry.  As  shown  in  Fig.  3.9,  a 
ring  structure  has  a  dielectric  constant  of  9.8.  A  BHW  plane  wave  propagating  in  (45,  45)  degree  is 
incident  and  an  observer  is  put  at  (150,  0,  20)  mm.  The  ring  structure  is  discretized  by  a  tetrahedron 
mesh  and  FETD  method  is  applied  at  this  subdomain.  To  validate  the  results,  the  electric  field  at 
the  observer  is  compared  with  that  by  using  pure  FDTD  method  with  a  high  sampling  density.  The 
comparison  is  shown  in  Fig.  3.10,  where  a  good  agreement  is  obtained.  The  CPU  time  is  2394  s  by 
using  1  core,  1496  s  by  using  2  cores,  1271  s  by  using  3  cores,  and  1199  s  by  using  4  cores. 
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Fig.  3.9:  A  dielectric  ring  model.  The  ring’s  height  is  39  mm,  inner  radius  is  16.54  mm,  and  outer 
radius  is  26.75  mm. 


Observer  Ez  transient 
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Fig.  3.10:  Comparison  of  electric  fields  at  the  observer  by  hybrid  method  and  pure  FDTD. 
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4.  PARALLEL  COMPUTATION  by  GPU 

Up  to  date,  General-Purpose  computation  on  Graphics  Processing  Units  (GPGPU)  is 
becoming  a  popular  technology  to  speed  up  the  task  with  high  computational  density.  The  basic 
idea  is  that  the  multiple  identical  execution  cores  in  a  Graphics  Processing  Unit  (GPU)  can  execute 
the  instructions  in  parallel.  If  a  task  can  be  divided  into  multiple  sub-tasks,  each  sub-task  can  be 
individually  delivered  to  a  GPU  execution  core  and  then  all  sub-tasks  can  be  executed  in  parallel. 
Thus,  the  total  time  cost  for  this  task  will  be  reduced.  For  an  FDTD  method,  due  to  the  localization 
of  the  memory  and  operations,  it  can  fully  take  advantage  of  this  kind  of  parallelization  and  obtain 
very  high  speed  up. 

4.1.  Platforms 

There  are  several  types  of  GPGPU  platforms  and  programming  languages  available.  The  most 
direct  development  platforms  come  from  the  GPU  manufactures.  The  NVidia  Corp.  creates  the 
Cuda  platform  based  on  their  GPU;  and  the  AMD  Corp.  creates  the  Stream  platform  based  on  their 
GPU.  These  two  platforms  cannot  work  cross-platform.  Microsoft  also  released  their  GPGPU 
API  in  Directxll;  it  is  compatible  with  NVidia’s  GPU  and  AMD’s  GPU.  The  highest  level 
programming  platform  is  OpenCL  (Open  Computing  Language).  It  is  a  framework  for  writing 
programs  that  execute  across  heterogeneous  platforms  consisting  of  CPUs,  GPUs,  and  other 
processors.  Currently,  NVidia,  AMD  and  Microsoft  all  provide  API  to  port  their  private  GPGPU 
API  to  OpenCL. 

All  these  development  platforms  have  their  advantages  and  disadvantages.  For  Cuda,  it  can 
develop  fastest  codes  today,  but  it  can  only  work  on  the  NVidia’s  hardware.  For  Stream,  the 
performance  is  less  than  Cuda.  The  performance  of  GPGPU  API  in  Direct  11  is  the  slowest  one 
compared  with  Cuda  and  Stream,  but  it  has  more  compatible  capability.  The  OpenCL  has  the  best 
compatible  capability  among  these  platforms.  But  the  standard  of  OpenCL  is  just  established  last 
year,  and  there  are  not  enough  documentations  and  programming  references  on  it. 

By  comparing  all  the  available  techniques,  we  choose  NVidia’s  Cuda  development  platform  as 
our  GPGPU  speedup  engine. 

4.2.  GPU  acceleration  method  on  CUDA  Engine 

In  ECT  (a  boundary  conformal  FDTD  method),  the  updating  scheme  of  the  electric  field 
depends  only  on  the  field  itself  and  adjacent  magnetic  fields.  Therefore,  all  electric  field 
components  can  be  updated  in  parallel  if  a  system  has  independent  memory  for  electric  field  and 
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the  magnetic  field  components.  Similarly,  the  magnetic  field  components  have  the  same  property. 
Thus,  it  makes  the  FDTD  method  ideal  for  GPU  acceleration.  We  have  implemented  GPU 
acceleration  based  on  the  Nvidia  CUDA  engine.  This  section  introduces  our  acceleration  method  in 
two  aspects:  the  GPU  acceleration  method  and  the  performance. 

In  CUDA  engine,  a  parallel  code  can  be  run  on  a  grid  with  AxB  blocks.  Each  block  has  CxD 
parallel  threads.  For  a  graphic  card  with  M  CUDA  cores,  we  can  make  CxDxK=M,  where  K 
=[  2'n  ...  2n]  and  n  and  M  are  integer  numbers.  Therefore,  a  parallel  code  can  be  finished  on  the 
grid  by  AxB/K  runs.  For  a  typical  middle-end  Nvidia  GPU,  for  example,  9600  GSO,  the  number  of 
CUDA  cores  can  be  around  100.  Thus,  a  parallel  code  on  CUDA  engine  can  obtain  significantly 
speedup.  For  the  electric  and  magnetic  fields  in  an  FDTD  computation  domain,  they  are  organized 
as  N  =  xxyxz  cells.  In  our  parallelized  GPU  code,  we  organized  CUDA  cores  by  a  grid  with  ceil 
(x/12)  x  ceil  (yt 8)  blocks  and  each  block  has  12x8=96  threads.  For  each  thread,  it  will  update  the 
field  along  z  axis  only.  Therefore,  our  code  has  approximate  (9 6/M)  /  ceil(9 6/M)  x  M 
parallelization  in  field  updating. 

Fig.  4.1  shows  the  flow  chart  of  the  GUP  acceleration  implementation  in  our  FDTD  solver. 
Another  consideration  is  the  effect  of  the  data  transfer  between  the  computer  system  memory  and 
GPU  local  memory.  The  data  transfer  includes  the  source  excitation  and  field  fetching.  The  CUDA 
code  can  only  run  on  a  GPU  and  operate  on  the  GPU  local  memory.  For  a  realistic  EM  simulator,  it 
is  necessary  to  add  source  excitation  on  the  fields  at  each  EM  updating  time  step.  CUDA  core  is 
designed  for  simple  calculation  and  the  CUDA  engine  can  only  support  limited  math  functions.  But 
the  excitation  pulse  in  Wavenology  EM  package  supports  a  combination  of  complicated  math 
functions,  some  of  them  are  not  supported  by  CUDA  engine.  Therefore,  the  excitation  at  each  time 
step  must  be  calculated  serially  on  CPU  and  then  transferred  to  GPU,  thus  decreasing  the  speedup 
performance.  In  addition,  Wavenology  EM  package  needs  to  record  the  EM  field  at  every  step  or  at 
every  several  steps.  As  mentioned  above,  the  fields  are  stored  in  GPU  memory  only.  Thus,  data 
transfer  from  GPU  memory  to  system  memory  is  required  also.  Due  to  the  fact  that  the  data  transfer 
speed  between  system  memory  is  much  slow  than  regular  CPU-system  memory  transfer  speed,  it 
will  decrease  the  speedup  performance  also.  Because  of  these  two  factors,  the  speedup  factor  of  the 
GPU  implementation  will  be  understandably  lower  than  the  number  of  threads;  nevertheless,  our 
performance  study  below  shows  that  the  speedup  is  quite  good. 
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4.3.  GPU  Acceleration  Examples 

We  show  the  performance  of  the  GPU  acceleration  in  Wavenology  EM  package.  In  the 
simulation,  we  use  an  Nvidia  9600GSO  graphic  card,  which  has  384  MB  local  memory,  and  96 
CUDA  cores.  The  maximum  number  of  threads  that  can  be  created  on  GPU  is  65535  and  the 
maximum  number  of  threads  in  each  block  is  512.  The  CPU  core  clock  is  1375  MHz  and  the  GPU 
memory  clock  is  800  MHz.  The  CPU  system  has  a  4-core  CPU  Intel  Q6600  with  DDR2-800 
memory.  As  mentioned  above,  in  our  current  implementation,  we  fix  the  GPU  block  as  12x8 
threads  per  block.  We  use  a  single  domain  cavity  case  to  compare  with  a  benchmark  case:  the 
Nvidia  CUDA  example  FDTD3D.  This  example  implements  a  finite-differential-time-domain 
method  on  single  field  propagation  in  a  3D  space.  We  consider  a  cavity  model  which  includes  two 
electric  dipole  sources  and  three  observers  to  record  the  signal.  The  computation  domain  is  divided 
into  100x100x100  and  280x280x16  cells,  respectively.  The  fields  are  recorded  at  every  time  step. 

The  speedup  performances  for  the  cavity  cases  and  CUDA  example  FDTD3D  are  shown  in  the 
Table  4.1. 
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Table  4.1:  The  speedup  performance  comparison 


FDTD  cavity 

FDTD  cavity 

Nvidia  CUDA  example 

Case  Name 

100x100x100 

280x280x16 

FDTD3D 

cells 

cells 

280x280x280  cells 

Speedup  factor 

20.2 

22.4 

40.1 

As  shown  in  the  table  4.1,  the  GPU  speedup  factor  in  the  FDTD  solver  of  Wavenology  EM  is 
about  20,  while  the  CUDA  example  FDTD3D  is  40.  It  is  reasonable  because  of  the  following 
aspects: 

1.  Example  FDTD3D  does  not  have  a  source  which  needs  to  be  calculated  in  the  CPU  and 
then  transfer  to  GPU. 

2.  Example  FDTD3D  does  not  need  to  read  field  from  GPU  at  every  time  step. 

3.  Updating  of  the  FDTD  is  more  complicated  than  the  example  FDTD3D.  The  FDTD  needs 
to  update  6  field  values  by  21  memory  locations,  while  the  FDTD3D  only  needs  to  update 
one  field  by  6  memory  locations. 

Table  4.1  also  shows  that  larger  x*y  case  has  better  speedup  performance  in  our  implementation: 
the  second  case  with  280x280x16  cells  has  a  speedup  factor  of  22.4,  while  the  first  case  with 
100x100x100  cells  has  a  speedup  factor  of  20.2. 

4.4.  Implementation  of  PML  subdomain 

As  we  demonstrated  the  performance  of  the  GPU  acceleration  for  cavity  cases  with  electrical 
dipole  sources  and  observers.  We  found  there  are  two  factors  that  affect  the  performance  of  the 
GPU  acceleration.  One  is  the  data  transfer  between  the  GPU  and  the  system  main  memory.  Another 
is  the  mesh  size  in  XY  plane.  In  this  period,  we  have  evaluated  the  effect  of  additional  PML 
sub-domains  for  the  GPU  acceleration.  PML,  which  is  a  sub-domains  as  absorbing  boundaries,  has 
10  layers  in  its  direction. 

Due  to  10  layers  for  each  PML  sub-domain,  those  4  small  subdomains  of  PMLs  have  10  x  10  x 
10  cells,  which  do  not  match  our  GPU  block  size  (12  x  8  in  XY  plane),  and  thus  limits  the 
performance.  Fig.  4.2  shows  the  flow  chart  for  time  stepping  in  the  GPU  acceleration.  The  PML  is 
implemented  separately  in  26  subdomains.  The  performance  of  whole  stepping  is  affected  by  the 
performance  of  each  sub  domains.  Therefore,  the  small  PML  sub  domains  will  decrease  the  GPU 
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acceleration  performance. 


Fig.  4.2:  Flow  chart  of  sequential  time-stepping  of  domains. 

We  found  that  our  implementation  on  the  GPU  acceleration  requires  a  modification  in  the 
stepping  scheme  to  reduce  the  impact  of  data  transfers  between  the  GPU  and  the  CPU  system  main 
memory.  It  is  possible  to  take  advantage  of  multiple-streams  technology  in  CUDA  to  step  PML 
subdomains  in  parallel  to  reduce  the  time  in  PML  stepping,  as  shown  in  Fig.  4.3. 


Field  Stepping 


Fig.  4.3:  Flow  chart  of  time  stepping  for  multiple-streams  domain. 


4.5.  Examples  of  PML  subdomain 


This  first  example  is  a  50  Q  strip  line  which  is  composed  of  a  PEC  ground,  a  RT  Duroid 
substrate  (er=2.2,  thickness  0.794  mm)  and  a  PEC  patch  with  width  of  2.413  mm  as  shown  in  Fig. 
4.3.  A  lumped  port  (7)  is  connected  to  a  terminal  of  strip  line  as  an  excitation.  A  50  f2  resistor  (7?7) 
is  attached  at  the  other  terminal  of  strip  line  as  a  load.  The  boundary  conditions  of  this  case  are 
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PML  except  zmin  position  where  PEC  boundary  condition  is  applied. 


Fig.  4.3:  Configuration  of  the  strip  line. 


Due  to  the  fact  that  there  are  only  one  source  and  one  data  point  to  be  recorded  in  this  example, 
the  data  transfer  between  the  GPU  and  the  CPU  system  main  memory  is  minimized.  Therefore,  the 
performance  of  GPU  speedup  will  be  dominated  by  the  mesh  size  in  XY  plane  and  the  PML 
sub-domains.  We  use  different  mesh  resolutions  to  evaluate  the  performance  of  the  GPU 
acceleration.  The  GPU  speedup  factor  is  defined  as  the  ratio  between  the  CPU  time  in  the  serial 
code  and  that  in  the  GPU  accelerated  code.  The  result  of  speedup  factor  for  this  example  is  shown 
in  table  4.2.  We  achieve  a  factor  of  12.1  speedup  using  the  GPU  for  this  example  when  the 
number  of  cells  is  100x100x80. 


Table  4.2:  Comparison  of  speedup  performance  for  the  strip  line  case 


Main  Computation 

Domain  Mesh  size 

8x12x12  cells 

30x30x30  cells 

100x100x80  cells 

Speedup  factor 

0.7 

5.2 

12.1 

From  the  table  4.2,  the  performance  of  the  GPU  code  is  better  as  the  size  of  mesh  increases. 
The  speedup  factor  of  the  Cavity  case  is  about  20,  while  the  strip  line  is  only  about  12.  The  main 
reason  for  this  is  the  usage  of  PML  sub-domains.  In  our  GPU  code,  we  step  all  domains 
sequentially.  Therefore,  a  one-domain  with  small  mesh  size  can  impact  the  performance  of  the 
whole  code. 

The  second  example  is  an  ultra  wideband  antenna  which  is  fed  by  a  lumped  port  (7)  and  is 
working  in  an  open  space  as  shown  in  Fig.  4.5.  We  put  several  observers  to  record  near  fields.  All 
boundary  conditions  of  this  case  are  PML  and  thus  this  model  includes  a  main  domain  and  26  PML 
sub-domains. 
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observers 


Fig.  4.5:  Configuration  of  the  ultra  wideband  antenna. 

Due  to  more  data  transfers  between  the  GPU  and  the  CPU  system  main  memory  and  more 
PML  sub-domains,  the  performance  of  GPU  speedup  will  be  lower  than  the  strip  line  case.  Here, 
we  use  different  mesh  resolution  and  different  number  of  observers  to  evaluate  the  performance  of 
the  GPU  acceleration.  The  resultant  speedup  is  shown  in  the  table  4.3. 


Table  4.3:  Comparison  of  speedup  performance  for  the  wideband  antenna 


Main  Computation 

Domain  Mesh  size 

51x72x54  cells 

13  observers 

104x102x78  cells 

5  observers 

104x102x78 

cells  13 

observers 

Speedup  factor 

8.9 

10.4 

9.7 

The  GPU  speedup  is  about  10.  Due  to  the  large  of  XY  mesh  we  can  get  better  performance.  But  the 
number  of  data  transfers  between  the  GPU  and  the  system  main  memory  will  decrease  the  speedup 
performance. 
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5.  COMPREHENSIVE  TESTINGS 

Our  field/circuit  software  package  combines  three  efficient  electromagnetic  field  algorithms, 
the  SETD  for  coarse  scales,  the  ECT  (an  improved  conformal  FDTD  method)  for  intermediate 
scales,  and  the  FETD  for  fine  scales,  together  with  nonlinear  circuit  solvers  (both  SPICE  circuit 
solver  and  our  internal  circuit  solver)  to  solve  complex  multiscale  problems.  A  number  of  cases  are 
tested  to  verify  and  validate  the  accuracy  and  efficiency  of  our  field/circuit  software  package. 

5.1.  Plane  Wave  Incident  on  a  Large  Domain 

•  Purpose:  Verify  the  high  accuracy  and  efficiency  of  the  DG-SETD  method  in  simulating 
large-scale  problems. 

•  Model:  The  geometry  of  the  case  has  a  large  dimension  of  10  wavelengths  along  all  the  three 
Cartesian  directions.  A  simple  homogeneous  model  with  a  plane  wave  incidence  is  employed  to 
examine  the  accuracy  of  the  numerical  results,  since  the  analytical  solution  for  such  a  model  is 
known. 

•  Result:  DG-SETD  is  much  efficient  than  FDTD  in  both  required  memory  and  CPU  time  as 
shown  in  Table  5.1  and  Table  52. 


Table  5.1:  Comparison  between  DG-SETD  and  FDTD  at  1%  error  tolerance 


Mesh  Size 

Memory  (MB) 

CPU  Time  (s) 

FDTD 

300  x  300  x  300 

3,844 

1,522 

DG-SETD 

63  x  63  x  63 

270 

683 

Gain  Ratio  by  DG 

108 

14.2 

2.2 

Table  5.2:  Comparison  between  DG-SETD  and  FDTD  at  0.1%  error  tolerance 


Mesh  Size 

Memory  (MB) 

CPU  Time  (s) 

FDTD 

600  x  600  x  600 

20,656 

18,309 

DG-SETD 

77  x  77  x  77 

380 

2,297 

Gain  Ratio  by  DG 

473 

53.3 

8.0 

46 


Wave  Computation  Technologies,  Inc.,  Contract  W911NF-09-C-0159 


Fig.  5.1:  The  mesh  of  the  FDTD  method  for  the  10  x  10  x  10  wavelength  problem.  It  has  300  cells  in 
each  direction,  requiring  30  points  per  wavelength  to  get  99%  accuracy  (1%  error). 


Fig.  5.2:  The  mesh  of  the  DG-SETD  method  for  the  10  x  10  x  10  wavelength  problem.  It  has  9 
elements  in  each  direction.  For  each  element,  the  7th  order  SETD  basis  functions  are  used.  Generally, 
it  requires  only  6.3  points  per  wavelength  to  get  99%  accuracy. 


5.2.  Scattering  of  a  Tilted  Thin  PEC  Plate  to  an  Electric  Dipole  Wave 


•  Purpose:  Test  the  accuracy  and  performance  of  the  hybrid  code  in  dealing  with  highly  fine 
details  in  an  intermediate  sized  problem. 
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•  Model:  A  thin  tilted  PEC  plate  oriented  along  (1,  1,  1)  direction  is  used  to  represent  the  highly 
fine  details.  The  thickness  of  the  PEC  plate  is  3  mm,  only  0.006  wavelengths  relative  to  the 
maximum  frequency  (600  MHz).  An  electric  dipole  aligned  along  the  z  direction  is  placed  at 
(250,  100,  250)  mm.  Three  receivers  are  placed  at  the  positions  (250,  400,  150),  (250,  800,  250), 
and  (250,  1200,  250)  mm,  respectively. 

•  Result:  The  hybrid  method  is  46.4  times  more  efficient  than  the  pure  FDTD  method  in 
computational  memory  and  1.5  times  faster  than  the  FDTD  method,  while  Good  agreement  is 
obtained  with  the  FDTD  results. 


Fig.  5.3:  Scattering  of  a  Tilted  Thin  PEC  Plate  to  waves  generated  by  an  Electric  Dipole  polarized 
along  the  z  direction.  The  model  includes:  (1)  a  tilted  thin  PEC  plate  (yellow  region)  at  the  center;  (2) 
an  electric  dipole  (red  dot)  at  the  left;  and  (3)  three  receivers  (grey  dots),  in  front  of,  close  to,  and  after 
the  plate,  respectively. 


Table  5.3:  Comparison  between  hybrid  FETD/FDTD  and  FDTD  method  for  PEC  plate  case 


Mesh  Size 

Memory  (MB) 

CPU  Time  (s) 

FDTD 

235  x  245  x  232 

18,410 

6,510 

Hybrid  FETD/FDTD 

33  x  17  x  17  +  9,345 

397 

4,290 

Gain  Ratio  by  Hybrid 

707 

46.4 

1.5 
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5.3.  A  Patch  Antenna  in  a  Reverberation  Chamber 

•  Purpose:  Test  the  accuracy  and  efficiency  of  the  hybrid  FETD/SETD  method. 

•  Model:  A  patch  antenna  is  measured  in  a  reverberation  chamber  with  a  metallic  stirrer.  The 
ratio  of  the  largest  dimension  to  the  smallest  dimension  in  this  case  is  1.6m  /  2  mm  =  800, 
which  means  a  typical  multi-scale  structure.  The  subdomains  containing  the  antenna  and  the 
stirrer  are  discretized  by  finite  elements  with  dense  meshes  (first  order  tetrahedral  mesh  with  a 
sampling  density  equal  to  20  PPW),  and  all  the  other  subdomains  are  discretized  by  higher 
order  spectral  elements  with  coarse  meshes  (fifth  order  hexahedral  mesh  with  a  sampling 
density  equal  to  10  PPW). 

•  Result:  Table  5.4  lists  the  comparison  of  computational  costs  by  the  two  methods  from  which 
we  observe  that  the  hybrid  SETD/FETD  method  is  more  efficient  than  the  conventional  FDTD 
method  for  such  multiscale  electromagnetic  simulations. 


Fig.  5.4:  A  patch  antenna  measured  in  a  reverberation  chamber.  The  size  of  this  chamber  is  1.6  m  x 
1.2  m  x  0.8  m.  The  thickness  of  the  metallic  stirrer  and  the  patch  antenna  are  5  mm  and  2  mm, 
respectively. 


Table  5.4:  Comparison  of  hybrid  and  FDTD  method 


Memory  (MB) 

CPU  Time  (h) 

FDTD 

2,560 

5.7 

Hybrid  FETD/SETD 

730 

4.2 

Gain  Ratio  by  Hybrid 

3.5 

1.4 
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5.4.  A  Tilted  Parallel  Wire  Transmission  Line  Operated  by  Lumped  Ports 

•  Purpose:  Test  the  integration  of  the  hybrid  method  with  circuit  elements. 

•  Model:  A  tilted  thin  parallel  wire  transmission  line  is  operated  by  two  lumped  ports.  One  of  the 
lumped  ports  is  used  to  excite  the  transmission  line,  while  the  other  one  is  used  to  receive  the 
signals  propagating  through  the  transmission  line.  The  thickness  of  the  wires  is  1  mm,  about  0.0033 
wavelengths  with  respect  to  the  excited  maximum  frequency.  This  transmission  line  problem  has 
multiple  scales:  the  fine  scales  around  the  thin  wires  and  the  coarse  scales  cover  the  open  air  area. 
To  solve  the  multiscale  problem,  a  hybrid  FETD/FDTD  method  is  applied.  At  the  center  around  the 
thin  wires,  an  FETD  subdomain  is  placed.  Elsewhere,  FDTD  subdomains  are  placed. 

•  Result:  The  unstructured  mesh  is  much  more  efficient  in  capturing  arbitrarily  shaped  small  details 
than  the  structured  Cartesian  mesh  used  by  the  FDTD  method.  It  requires  much  fewer  elements. 
The  simulation  by  the  hybrid  method  takes  less  than  2  hours  and  requires  only  729  MB  memory. 


Fig.  5.5:  A  tilted  thin  parallel  wire  transmission  line  is  connected  to  two  lumped  ports.  The  port 
denoted  by  a  red  arrow  is  used  to  excite  the  transmission  line,  while  the  port  denoted  by  a  blue  arrow 
is  used  to  receive  the  signals  propagating  through  the  transmission  line. 
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Fig.  5.6:  FETD  mesh  for  hybrid  method.  The  number  of  tetrahedron  is  8226  and  the  number  of 
unknown  is  64,597. 


Fig.  5.7:  This  total  domain  is  divided  into  3x3x3  subdomains.  At  the  center  around  the  thin  wires, 
an  FETD  subdomain  is  placed.  Elsewhere,  FDTD  subdomains  are  placed. 
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Fig.  5.8:  The  Cartesian  FDTD  mesh  requires  about  408  x  484  x  408  cells  to  capture  the  thin  wires. 

5.5.  Wave  propagation  in  a  Room 

•  Purpose:  Show  the  high  efficiency  of  the  SETD  method  for  large-scale  models  as  a  real 
application. 

•  Model:  The  problem  is  a  wireless  communication  in  a  room  to  simulate  the  wave  propagation 
characteristics  of  the  walls  and  PEC  pipes.  All  walls  are  modeled  by  a  dielectric  material  (sr  = 
2.4)  and  6  PEC  pipes  located  vertically.  An  incident  plane  wave  source  with  1.2  GHz  of 
maximum  frequency  is  used,  thus  the  maximum  problem  size  will  be  about  19.4  wavelengths 
for  each  direction.  The  plane  wave  time  function  is  the  first  derivative  of  the  Blackman-Harris 
window  function. 

•  Result:  The  SETD  method  is  51  times  less  than  FDTD  method  in  memory  requirement  and 
1.28  times  faster  in  CPU  time  as  shown  in  Table  5.5. 

Table  5.5:  Cost  Comparison  between  SETD  and  FDTD 


Number  of  unknowns 

(Million) 

Memory  (GB) 

CPU  Time 

SETD 

6.58 

0.53 

5h  50m 

FDTD 

1,213 

27 

7h  27m 
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Fig.  5.9:  The  model  of  room  in  a  wireless  communication  setting  where  all  walls  are  modeled  as  a 
dielectric  material  with  PEC  pipes  located  along  the  z  direction.  Bounding  box  of  this  model  is  (-1.6, 
-1.6,  -1.6)  m  (lower  corner)  to  (1.6,  1.6,  1.6)  m  (upper  corner).  The  plane  wave  is  polarized  in  the 
z-direction  and  propagates  along  the  x-direction.  An  observer  is  located  at  (-1.2,  0.3,  -0.5)  m  to 
calculate  fields. 


5.6.  A  Communication  System 

•  Purpose:  Show  the  capability  of  Wavenology  EM  in  simulating  a  communication  system  with 
modulation  and  demodulation  units  by  the  ECT  field  solver  and  SPICE  circuit  solver  inside  our 
software. 

•  Model:  This  system  consists  of  two  parts:  one  is  a  transmitter  using  a  patch  antenna  and  the 
other  is  a  receiver  using  a  monopole  antenna.  The  observed  electric  field  at  (0,  250,  20)  mm, 
located  at  the  center  of  the  computational  domain  where  we  can  observe  the  waveform 
including  both  carrier  and  signal  generated  at  the  transmitter. 

•  Result:  We  confirm  that  the  ECT  combined  with  the  nonlinear  SPICE  solver  can  model  and 
analyze  the  communication  system  very  conveniently  within  one  single  simulation.  Other 
similar  systems  with  complicated  circuits  can  be  also  analyzed  by  our  software. 
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Fig.  5.10:  Model  of  the  communication  system  including  transmitter  (left)  and  receiver  (right).  The 
substrate  in  the  transmitter  has  a  dimension  of  76  mm  x  1.5875  mm  x  79.5  mm  and  a  dielectric 
constant  of  4.24.  The  receiver  is  located  about  500  mm  away  from  the  transmitter. 


*9  Rd  Ld  D1 


Fig.  5.11:  SPICE  circuits  in  the  communication  system  in  Fig.  5.10.  (a)  Modulation  circuit  denoted  as 
Circuitl  and  Circuit2  in  the  transmitter  in  Fig.  5.10.  (b)  Demodulation  circuit  denoted  as 

“ampdemod”  in  the  receiver  in  Fig.  5.10. 
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Fig.  5.12:  Demodulated  signal  filtered  by  the  demodulator  in  the  receiver  in  Fig.  5.10. 


5.7.  A  Patch  Antenna  Modulated  by  Dual  Memristors 

•  Purpose:  Model  the  directly  modulated  microstrip  patch  antenna  system  with  dual  memristors 
[1]  by  our  ECT  with  the  nonlinear  SPICE  solver. 

•  Model:  An  L-band  directly  modulated  microstrip  patch  antenna  is  chosen.  The  dielectric 
constant  of  the  substrate  is  4.24  and  relative  permeability  is  1.  The  area  of  the  square  patch  is 
47.5  x  47.5  mm2,  approximately  equally  to  half-wavelength  in  the  dielectrics.  The  feeding 
point  is  located  at  (24.3,  24.3)  mm,  approximately  quarter-wavelength  in  the  dielectrics,  on  the 
top  patch  plane. 

•  Result:  The  memristive  system  using  a  directly  modulated  patch  antenna  system  with  dual 
memristors  has  been  analyzed  successfully  by  our  ECT  within  Wavenology  EM  with  the 
nonlinear  SPICE  solver.  We  can  expect  that  this  method  is  also  applicable  to  a  variety  of 
mechanisms  underlying  the  direct  modulation  effects. 
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Fig.  5.13:  Dual  memristors  embedded  in  an  L-band  directly  modulated  patch  antenna.  Circuitl  and 
Circuit2  denote  the  memristors. 


0  20  40  340  360  380 

Time  (ns)  Time  (ns) 


Fig.  5.14:  Radiated  field  by  the  memresitor  modulated  patch  antenna  in  Fig.  5.13.  (a)  Calculated 

near  electric  field  Ex  at  (0,  20,  0)  mm.  This  field  is  a  combined  nonlinear  response  of  a  1.455  GHz 
sinusoidal  carrier  wave  and  a  100  MHz  baseband  signal,  (b)  Calculated  far  field  Ex  at  (0,  0,  200)  mm. 


5,8,  Antenna  Array 

•  Purpose:  Show  the  capability  of  Wavenology  EM  in  modeling  complicated  structures 
including  antenna  arrays. 

•  Model:  The  considered  patch  antenna  array  is  fabricated  on  an  imaging  chamber.  The  structure 
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of  patch  antenna  array  introduces  an  inhomogeneous  background  medium.  The  substrate  is  FR4 
whose  sr  is  equal  to  4.9  and  its  thickness  is  1.6  mm.  The  medium  beyond  the  patch  is  a 
matching  medium,  acetone.  There  are  eight  interlacing  antennas  on  each  side  panel.  All  the 
eight  antennas  share  a  ground.  The  four  side  walls  are  fabricated  with  identical  antenna  array, 
totally  32  antennas.  The  bottom  is  sealed  by  a  single  layer  copper  PCB  board,  while  the  top  is 
open.  The  chamber  is  filled  with  acetone,  the  matching  medium  to  image  biological  tissue  such 
as  the  female  breast. 

•  Result:  The  example  shows  that  the  FDTD  method  can  deal  complicated  model  and  can  be 
used  in  many  applications  such  as  an  inverse  problem  solver. 


Fig.  5.15:  Numerical  model  of  the  patch  antenna  and  chamber.  12  x  12  x  12  of  observers  are  located 
inside  the  chamber  with  4  mm  distance  from  each  other;  32  waveports  at  the  feeds  of  the  antennas  on 
four  side  walls  are  used  as  sources  to  excite  the  fields  sequentially.  Dielectric  spheres  are  target  objects 
to  be  reconstructed  in  an  inverse  problem  solver. 
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Fig.  5.16:  S-Parameters  of  the  patch  antenna  and  chamber  when  port  17  is  used  as  the  source. 


5.9.  Dual-Band  Printed  Electrically  Small  Antenna 

•  Purpose:  This  model  provides  a  dual-band  compact  ESA  unit  suitable  for  the  MIMO 
application  in  smart  and  portable  wireless  devices. 

•  Model:  Fig.  5.17  presents  the  configuration  of  the  dual-band  electrically  small  antenna  with 
two  capacitive  split-ring  resonators  (SRR).  This  ESA  structure  is  fed  by  a  high  performance  50 
Q  coaxial  RF  sub-Miniature  version  B  connector.  The  inner  conductor  of  this  connector  is 
connected  to  the  right  strip  of  the  loop  and  the  outer  pins  of  the  connector  are  directly  soldered 
to  the  left  strip  of  the  loop.  The  ESA  operates  at  dual  frequency  bands.  The  fundamental  idea  is 
that  two  resonant  modes  are  achieved  through  the  field  interaction  among  SRRs  and  the  small 
loop.  In  this  interaction,  two  capacitive  SRRs  play  the  role  of  impedance  matching  to  the  small 
loop. 

•  Result:  Fig.  5.18  shows  the  measured  and  simulated  return  loss  of  the  antenna  where  we  can 
see  that  the  simulated  result  has  a  good  agreement  with  the  measured  one.  The  resonant 
frequencies  are  about  934  MHz  and  1.55  GHz.  The  bandwidth  at  the  lower  band  is  about  3.76% 
at  -8  dB  level  and  the  bandwidth  at  the  higher  band  is  around  2.58%.  Both  bandwidths,  lower 
than  the  bandwidth  of  a  conventional  loop  antenna,  are  subjected  to  the  large  Q-factor  from  the 
strong  self-resonance  of  SRRs. 
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(b)  (c) 

Fig.  5.17:  The  ESA  configuration,  (a)  Perspective  view,  (b)  Geometry  of  the  SRRs  on  the  bottom,  (c) 
Geometry  of  the  loop  on  the  top. 


Fig.  5.18:  |Sn|  comparison  between  measurement  and  simulation,  which  shows  the  operating 
frequencies  are  934  MHz  and  1.55  GHz. 
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5.10.  Simple  Multiple-Channel  Communication 

•  Purpose:  Test  multiple-channel  communication  with  modulation  and  demodulation  units  by  the 
ECT  field  solver  and  SPICE  circuit  solver  inside  our  software.  We  have  tested  a  single-channel 
communication  at  5.6  A  Communication  System  where  the  medium  is  homogeneous.  And  this 
model  will  be  an  extended  version  of  the  previous  case. 

•  Model:  Fig.  5.19  shows  the  model  which  consists  of  soil  on  bottom  and  walls  on  the  other  sides. 
The  configuration  is  similar  to  the  previous  case.  The  transmitter  includes  a  lumped  port  with  a 
carrier  at  frequency  1.455  GHz  and  also  modulators,  denoted  as  Circuitl  and  Circuit2,  which 
generate  a  CW  signal  with  0.1  GHz  frequency. 

•  Result:  Demodulated  signal  filtered  by  the  demodulator  in  the  receiver  is  shown  in  Fig.  5.20  (a) 
and  that  by  the  previous  case  is  as  also  shown  in  Fig.  5.20  (b)  for  comparison.  From  these 
figures  we  can  observe  that  the  magnitude  of  demodulated  signal  is  less  than  that  of  the 
previous  case,  because  this  model  has  multiple  channels  due  to  multiple  reflections. 


Fig.  5.19:  Simple  multiple-channel  communication  room  which  has  a  dimension  of  3.2  m  x  6.2  m  x 
2.0  m. 
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(a) 


Circuit  voltage  [x] 


(b) 

Fig.  5.20:  Demodulated  signal  filtered  by  the  demodulator  in  the  receiver,  (a)  from  this  model,  and  (b) 
from  the  last  model  (5.6  A  Communication  System). 


5.11.  Complicated  Multiple-Channel  Communication 

•  Purpose:  Show  a  complicated  multiple-channel  communication  system  which  is  more  realistic 
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version  of  the  last  model. 

•  Model:  This  example  has  many  furniture  and  many  electronic  devices  which  can  be  modeled 
by  wood  and  PEC,  respectively  as  shown  in  Fig.  5.21.  The  other  configurations  are  the  same  as 
the  last  model.  Due  to  complicate  objects  including  wall  and  soil  traveling  signal  can  be 
reflected  and  decay. 

•  Result:  Fig  5.22  shows  demodulated  signals  filtered  by  the  demodulator  from  this  model  and 
the  last  model.  We  can  observe  that  the  magnitude  of  this  signal  is  less  than  that  of  the  last 
model.  The  reason  is  that  the  reflection  by  objects  in  the  model  is  larger  than  that  in  the  last 
model  and  it  causes  more  decay.  As  seen  in  the  above  results,  we  confirm  that  Wavenology  EM 
combined  with  the  nonlinear  SPICE  solver  can  analyze  or  design  complicated  multiple-channel 
communication  systems. 


Fig.  5.21:  Complicated  multiple-channel  communication  room  which  has  a  dimensions  of  3.2  m  x  6.2 
m  x  2.0  m. 
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Scattered  Voltage  of  Lumped  Port  [x] 


Fig.  5.22:  Demodulated  signal  filtered  by  the  demodulator  in  the  receiver  in  Fig.  5.21. 


63 


Wave  Computation  Technologies,  Inc.,  Contract  W911NF-09-C-0159 


6.  CONCLUSION 

We  have  developed  a  new  multiscale  field/circuit  solver  by  combining  three  efficient  methods:  the 
SETD  method,  FETD  method  and  ECT  method.  We  have  prepared  a  detailed  requirement  analysis  of  a 
software  tool  for  field/circuit  applications  from  the  view  of  point  of  the  users.  And  then  we  have 
designed  an  architecture  for  our  software,  especially  a  simple  multidomain  structure  to  support  several 
different  methods  and  their  coupling  required  by  the  multiscale  hybrid  algorithm.  Based  on  the 
architecture  we  have  completed  the  development  of  the  multi-domain  ECT  engine  for  field/circuit 
simulation  for  a  potential  better  performance  under  parallel  computation  and  a  preparation  for  our 
hybrid  method.  Finally  we  have  completed  the  development  of  the  hybrid  SETD/FETD/ECT/Circuit 
engine  for  multiscale  simulation.  Employing  the  fact  those  multidomain  methods  are  well  suited  to 
parallel  computation,  we  have  completed  the  parallel  computation  for  our  hybrid  method  based  on 
multi-scale  field/circuit  package.  And  we  also  have  completed  the  implementation  of  hardware  (GPU) 
acceleration  method  for  the  package. 

To  verify  our  package,  we  have  completed  comprehensive  tests  of  the  field/circuit  package.  The 
test  cases  include:  (a)  A  plane  wave  incident  on  a  large  domain  to  examine  the  performance  of  the 
DG-SETD  method;  (b)  Scattering  of  a  tilted  thin  PEC  plate  to  an  electric  dipole  wave  to  examine  the 
performance  of  hybrid  FETD/FDTD  method;  (c)  A  patch  antenna  in  a  reverberation  chamber  to  examine 
the  performance  of  the  hybrid  FETD/SETD  method;  (d)  A  tilted  parallel  wire  transmission  line 
connected  lumped  ports  to  examine  the  integration  of  the  hybrid  method  with  circuit  elements,  (e)  Wave 
propagation  in  a  room  as  a  realistic  large  domain  to  examine  the  performance  of  the  SETD  method  and 
compare  its  result  with  the  FDTD  result;  (f)  Communication  system  to  show  a  capability  of  modeling 
with  a  communication  system  with  modulation  and  demodulation  by  FDTD  with  SPICE,  from  the 
beginning  to  the  end;  (g)  A  patch  antenna  modulated  by  dual  memristors,  which  is  the  first  such  system 
to  be  analyzed  by  a  combined  field-circuit  solver,  where  transient  electromagnetic  radiation  is 
modulated  by  a  nonlinear  memristive  system,  to  show  the  capability  of  the  FDTD  field  solver  with  the 
SPICE  circuit  solver;  (h)  A  complicated  3D  antenna  array  for  microwave  imaging;  it  has  complicated 
structures,  numerous  observers,  and  multiple  wave  ports,  to  show  the  usefulness  and  flexibility  of 
Wavenology  EM  solver,  (i)  A  dual-band  printed  electrically  small  antenna  which  is  fundamentally 
important  for  modern  wireless  communication  systems;  (j)  A  simple  multiple-channel  communication 
using  the  ECT  field  solver  and  SPICE  circuit  solver  inside  our  software;  (k)  A  complicated 
multiple-channel  communication  which  shows  Wavenology  EM  combined  with  the  nonlinear  SPICE 
solver  can  analyze  or  design  complicated  multiple-channel  communication  systems. 
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